HTML API: Preserve decoder match length on named-reference miss by sirreal · Pull Request #66 · sirreal/wordpress-develop

sirreal · 2026-06-12T21:14:00Z

What

Fixes WP_HTML_Decoder::read_character_reference() so that an unmatched named character reference leaves the by-reference match length unchanged.

Issue

WP_Token_Map::read_token() returns null when no token matches, but the decoder compared the result against false. On an unmatched named reference the guard therefore never fired, and null flowed into the later semicolon-less reference path, where the decoder could compute a non-zero match length even though no character reference had matched.
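The mismatch comes down to strict comparison. The sketch below uses a hypothetical stub, read_token_stub(), in place of WP_Token_Map::read_token(); it only mirrors the "null on miss" behavior described above and is not the WordPress implementation.

```php
<?php
// Hypothetical stand-in for WP_Token_Map::read_token() on a miss.
// Per the description above, a miss yields null (not false).
function read_token_stub(): ?string {
	return null;
}

$replacement = read_token_stub();

// Guard as described for trunk: null is not identical to false.
$old_guard_catches_miss = ( false === $replacement );

// Guard proposed in this PR: identical to the null returned on a miss.
$new_guard_catches_miss = ( null === $replacement );

var_dump( $old_guard_catches_miss ); // bool(false)
var_dump( $new_guard_catches_miss ); // bool(true)
```

Because the old guard evaluates to false on a miss, execution continues past the early return.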

Callers use the by-reference match length to advance through a string only when a reference is actually found. A miss must return null and leave the supplied match length untouched.
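A minimal sketch of that caller contract follows. Both read_reference_stub() and decode_stub() are hypothetical names introduced for illustration; the stub knows only two references and is not the real decoder.

```php
<?php
// Hypothetical stub with the same by-reference shape as the decoder call.
// It only recognizes "&amp;" and "&lt;".
function read_reference_stub( string $text, int $at, &$match_byte_length = null ): ?string {
	$known = array(
		'&amp;' => '&',
		'&lt;'  => '<',
	);
	foreach ( $known as $name => $replacement ) {
		if ( substr( $text, $at, strlen( $name ) ) === $name ) {
			$match_byte_length = strlen( $name );
			return $replacement;
		}
	}
	// Miss: return null and leave $match_byte_length untouched.
	return null;
}

// Hypothetical caller: advance by the match length only on a hit.
function decode_stub( string $text ): string {
	$out = '';
	$at  = 0;
	$end = strlen( $text );
	while ( $at < $end ) {
		$length  = null;
		$decoded = read_reference_stub( $text, $at, $length );
		if ( null === $decoded ) {
			// No reference here: copy one byte and move on.
			$out .= $text[ $at ];
			++$at;
			continue;
		}
		$out .= $decoded;
		$at  += $length;
	}
	return $out;
}

echo decode_stub( 'a &lt; b &bogus; c &amp; d' ), "\n"; // a < b &bogus; c & d
```

A caller written this way never reads the length after a miss, which is why a stray write on the miss path is easy to overlook.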

Reproduction

On trunk, a miss in data context alters the by-reference length:

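// Seed the by-reference argument with a sentinel so any write is visible.
$match_byte_length = "sentinel";
// "&bogus;" is not a named character reference, so this lookup misses.
$result = WP_HTML_Decoder::read_character_reference( "data", "&bogus;", 0, $match_byte_length );

var_dump( $result );
var_dump( $match_byte_length );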

Expected:

NULL
string(8) "sentinel"

Actual on trunk:

NULL
int(1)

The previously shown attribute-context &bogus; demo does not reproduce this bug because the attribute ambiguity branch returns before mutating the match length. The underlying contract still applies to both contexts: a failed match should not set $match_byte_length.

Fix

Check null === $replacement after WP_Token_Map::read_token(), matching the token-map API contract.
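The effect of the corrected guard can be modeled without WordPress loaded. This is a simplified sketch: read_named_reference_model() and its two-entry lookup table are assumptions for illustration, it assumes $at points at an ampersand, and the real method has more branches and uses WP_Token_Map for lookups.

```php
<?php
// Simplified, hypothetical model of the named-reference path.
function read_named_reference_model( string $text, int $at, &$match_byte_length = null ): ?string {
	$names   = array(
		'amp;' => '&',
		'lt;'  => '<',
	); // Illustrative table only.
	$name_at = $at + 1;

	$replacement = null;
	$name_length = 0;
	foreach ( $names as $name => $value ) {
		if ( substr( $text, $name_at, strlen( $name ) ) === $name ) {
			$replacement = $value;
			$name_length = strlen( $name );
			break;
		}
	}

	// A miss is signalled by null, so return before any length is
	// computed or written to the by-reference parameter.
	if ( null === $replacement ) {
		return null;
	}

	$match_byte_length = $name_at + $name_length - $at;
	return $replacement;
}

$match_byte_length = 'sentinel';
$result            = read_named_reference_model( '&bogus;', 0, $match_byte_length );

var_dump( $result );            // NULL
var_dump( $match_byte_length ); // string(8) "sentinel"
```

With the early return in place, the sentinel survives a miss, matching the expected output in the reproduction.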

Validation

vendor/bin/phpunit --filter test_unmatched_named_character_reference_does_not_set_match_byte_length tests/phpunit/tests/html-api/wpHtmlDecoder.php

Result: OK, 4 tests, 8 assertions.

Trac ticket: TBD

Use of AI Tools

AI assistance: Yes
Tool(s): Codex
Model(s): GPT-5
Used for: splitting the fuzzer-discovered fix into a focused PR, drafting reproduction notes, and running validation. Final implementation was reviewed against the branch diff.

This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

github-actions · 2026-06-12T22:14:17Z

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props jonsurrell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

Preserve decoder match length on named-reference miss

9b47c0c

sirreal marked this pull request as ready for review June 12, 2026 22:11

sirreal wants to merge 1 commit into trunk from fix/html-decoder-token-map-null
